Longest - match String Searching for Ziv – Lempel Compression timothy
نویسندگان
چکیده
SUMMARY Ziv–Lempel coding is currently one of the more practical data compression schemes. It operates by replacing a substring of a text with a pointer to its longest previous occurrence in the input, for each coding step. Decoding a compressed file is very fast, but encoding involves searching at each coding step to find the longest match for the next few characters. This paper presents eight data structures that can be used to accelerate the searching, including adaptations of four methods normally used for exact matching searching. The algorithms are evaluated analytically and empirically, indicating the trade-offs available between compression speed and memory consumption. Two of the algorithms are well-known methods of finding the longest match—the time-consuming linear search, and the storage-intensive trie (digital search tree). The trie is adapted along the lines of a PATRICIA tree to operate economically. Hashing, binary search trees, splay trees and the Boyer–Moore searching algorithm are traditionally used to search for exact matches, but we show how these can be adapted to find longest matches. In addition, two data structures specifically designed for the application are presented.
منابع مشابه
Longest-match String Searching for Ziv-Lempel Compression
Ziv-Lempel coding is currently one of the more practical data compression schemes. It operates by replacing a substring of a text with a pointer to its longest previous occurrence in the input, for each coding step. Decoding a compressed le is very fast, but encoding involves searching at each coding step to nd the longest match for the next few characters. This paper presents eight data struct...
متن کاملFast Multi-Match Lempel-Ziv
One of the most popular encoder in the literature is the LZ78, which was proposed by Ziv and Lempel in 1978. After the original paper was published, many variants of the LZ78 were proposed to improve its performance. Simulation results have shown that the well known LZW version has an improvement around 10%. Given a sequence u ∈ A, where A is the source alphabet, all versions of LempelZiv parse...
متن کاملA General Practical Approach to Pattern Matching over Ziv-Lempel Compressed Text
We address in this paper the problem of string matching on Lempel-Ziv compressed text. The goal is to search a pattern in a text without uncompressing. This is a highly relevant issue, since it is essential to have compressed text databases where eecient searching is still possible. We develop a general technique for string matching when the text comes as a sequence of blocks. This abstracts th...
متن کاملFast Relative Lempel-Ziv Self-index for Similar Sequences
Recent advances in biotechnology and web technology are generating huge collections of similar strings. People now face the problem of storing them compactly while supporting fast pattern searching. One compression scheme called relative Lempel-Ziv compression uses textual substitutions from a reference text as follows: Given a (large) set S of strings, represent each string in S as a concatena...
متن کاملLempel-Ziv factorization: Simple, fast, practical
For decades the Lempel-Ziv (LZ77) factorization has been a cornerstone of data compression and string processing algorithms, and uses for it are still being uncovered. For example, LZ77 is central to several recent text indexing data structures designed to search highly repetitive collections. However, in many applications computation of the factorization remains a bottleneck in practice. In th...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1993